Dueling Bandit Algorithm

Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit

Huang, Tian, Li, Ke

arXiv.org Artificial Intelligence

Optimization problems arise in both single-objective and multi-objective scenarios. In practical applications, users want solutions that converge to their region of interest (ROI) along the Pareto front (PF). While the conventional approach approximates a fitness function or an objective function to reflect user preferences, this paper explores an alternative avenue: a method that sidesteps the need to calculate a fitness function, relying solely on human feedback. Our proposed approach conducts direct preference learning via an active dueling bandit algorithm. The experimental phase is structured into three sessions: first, we assess the performance of our active dueling bandit algorithm; second, we implement our proposed method within multi-objective evolutionary algorithms (MOEAs). This research presents a novel interactive preference-based MOEA framework that both addresses the limitations of traditional techniques and opens new possibilities for optimization problems.
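To make the bandit component concrete, here is a minimal, RUCB-flavored sketch in Python of a dueling bandit that actively selects pairs of candidate solutions for preference queries. This is an assumed illustration, not the paper's algorithm: the names dueling_bandit_select and query_preference are hypothetical, the oracle stands in for human feedback, and the confidence-bound rule is a standard UCB heuristic.

import math
import numpy as np

def dueling_bandit_select(n_candidates, query_preference, horizon=200, alpha=0.51):
    # Hypothetical sketch: query_preference(i, j) stands in for a human
    # decision maker and returns True if candidate i is preferred to j.
    wins = np.zeros((n_candidates, n_candidates))  # wins[i, j]: duels i won against j
    for t in range(2, horizon + 2):
        games = np.maximum(wins + wins.T, 1)
        # Optimistic estimate of P(i beats j); unexplored pairs look promising.
        ucb = wins / games + np.sqrt(alpha * math.log(t) / games)
        np.fill_diagonal(ucb, 0.5)
        # Candidate winner: optimistically beats the most opponents.
        i = int(np.argmax((ucb >= 0.5).sum(axis=1)))
        # Opponent: the rival most plausibly better than i.
        rivals = ucb[:, i].copy()
        rivals[i] = -np.inf
        j = int(np.argmax(rivals))
        # One preference query decides the duel.
        if query_preference(i, j):
            wins[i, j] += 1
        else:
            wins[j, i] += 1
    games = np.maximum(wins + wins.T, 1)
    # Return the candidate with the most empirical pairwise wins.
    return int(np.argmax((wins / games >= 0.5).sum(axis=1)))

In an interactive MOEA loop, query_preference would be answered by the decision maker comparing two candidate solutions, so the bandit spends its limited query budget on the duels that most reduce uncertainty about the preferred region.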


Preference-based Reinforcement Learning with Finite-Time Guarantees

Xu, Yichong, Wang, Ruosong, Yang, Lin F., Singh, Aarti, Dubrawski, Artur

arXiv.org Artificial Intelligence

Preference-based Reinforcement Learning (PbRL) replaces the reward values of traditional reinforcement learning with preferences, to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret. Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy. In this paper, we present the first finite-time analysis for general PbRL problems. We first show that a unique optimal policy may not exist when preferences over trajectories are deterministic. If preferences are stochastic and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that identify the best policy up to accuracy $\varepsilon$ with high probability. Our method explores the state space by navigating to under-explored states, and solves PbRL using a combination of dueling bandits and policy search. Experiments show the efficacy of our method when applied to real-world problems.
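As a rough illustration of the dueling step, the following sketch decides which of two policies is better using only noisy trajectory preferences and a Hoeffding-style stopping rule. The names duel_policies and prefer are hypothetical, and this generic confidence-bound comparison is only one ingredient of the paper's method, which also directs exploration toward under-explored states and performs policy search.

import math

def duel_policies(pi_a, pi_b, prefer, delta=0.05, max_queries=10_000):
    # Repeatedly duel two policies via a stochastic trajectory-preference
    # oracle until one is confidently better or the budget runs out.
    # prefer(pi_a, pi_b) returns True if a rollout of pi_a is preferred.
    wins_a = 0
    for n in range(1, max_queries + 1):
        if prefer(pi_a, pi_b):
            wins_a += 1
        p_hat = wins_a / n
        # Hoeffding-style radius with a union bound over rounds.
        radius = math.sqrt(math.log(2 * n * n / delta) / (2 * n))
        if p_hat - radius > 0.5:
            return pi_a  # pi_a beats pi_b with high probability
        if p_hat + radius < 0.5:
            return pi_b
    # Budget exhausted: fall back to the empirical majority.
    return pi_a if wins_a >= max_queries / 2 else pi_b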


Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Ramamohan, Siddartha Y., Rajkumar, Arun, Agarwal, Shivani

Neural Information Processing Systems

Recent work on deriving $O(\log T)$ anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show $O(\log T)$ anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.
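For intuition, here is a minimal UCB-style sketch targeting the Copeland set, one tournament solution the paper covers: confidence intervals on each pairwise preference probability yield optimistic Copeland scores, which determine the next duel. This is an assumed illustration in the spirit of UCB-style dueling bandits, not the paper's exact family of algorithms; copeland_ucb_step is a hypothetical name.

import math
import numpy as np

def copeland_ucb_step(wins, t, alpha=0.51):
    # wins[i, j] counts past duels in which arm i beat arm j.
    games = np.maximum(wins + wins.T, 1)      # avoid division by zero
    p_hat = wins / games                      # empirical P(i beats j)
    radius = np.sqrt(alpha * math.log(max(t, 2)) / games)
    upper = p_hat + radius
    lower = p_hat - radius
    np.fill_diagonal(upper, 0.0)              # an arm never duels itself
    # Optimistic Copeland score: opponents an arm could still plausibly beat.
    opt_scores = (upper > 0.5).sum(axis=1)
    i = int(np.argmax(opt_scores))            # candidate Copeland winner
    # Duel i against the opponent whose outcome is most uncertain.
    uncertainty = upper[i] - lower[i]
    uncertainty[i] = -np.inf
    j = int(np.argmax(uncertainty))
    return i, j

Calling copeland_ucb_step(wins, t) each round, playing the returned duel, and incrementing wins accordingly gives a simple anytime loop; other tournament solutions would swap in a different scoring rule in place of the Copeland count.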